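Day 17:
I hope the view count goes up a bit, and that more people improve after reading my tutorials.
The Natural Language Toolkit (NLTK) is a popular Python library for working with human language data. It provides tools and resources for tasks such as tokenization, stemming, tagging, and parsing.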
pip install nltk
import nltk
nltk.download()
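NLTK also includes a few simple chatbots in the nltk.chat package. They perform simple pattern matching on the sentences the user types and respond with automatically generated sentences.
import nltk
nltk.chat.eliza.eliza_chat()  # start the ELIZA chatbot
Output: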
Therapist
---------
Talk to the program by typing in plain English, using normal upper-
and lower-case letters and punctuation. Enter "quit" when done.
========================================================================
Hello. How are you feeling today?
>
https://www.nltk.org/api/nltk.chat.html
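To make the pattern matching idea concrete, here is a minimal sketch of a custom chatbot built with `Chat` and `reflections` from `nltk.chat.util`. The two patterns and their replies are made up for illustration and are not taken from the ELIZA bot. Calling `respond()` returns a reply without starting an interactive session.

```python
from nltk.chat.util import Chat, reflections

# Each pair is (regular expression, list of possible replies).
# These example pairs are hypothetical; real bots such as ELIZA define many more.
pairs = [
    (r"I need (.*)", ["Why do you need %1?"]),  # %1 is filled with the first captured group
    (r"(.*)", ["Please tell me more."]),        # fallback when nothing else matches
]

# reflections swaps first and second person words, e.g. "my" becomes "your".
bot = Chat(pairs, reflections)

reply = bot.respond("I need my coffee")
fallback = bot.respond("Hello")
print(reply)
print(fallback)
```

The pairs are tried in order, so the catch-all pattern has to come last.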
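The word_tokenize() function in NLTK
word_tokenize() first splits the text into sentences with the pretrained Punkt model, then applies a set of rules and heuristics to split each sentence into words and punctuation marks. It takes a text string as input and returns a list of tokens.
Example: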
import nltk
nltk.download('punkt')  # download the punkt model (newer NLTK releases may ask for 'punkt_tab' instead)
text = "NLTK is a powerful library for AI chat bot"
important_words = nltk.word_tokenize(text)  # split the text into tokens
print(important_words)
Output:
[nltk_data] Downloading package punkt to
[nltk_data] C:\Users\leung\AppData\Roaming\nltk_data...
[nltk_data] Package punkt is already up-to-date!
['NLTK', 'is', 'a', 'powerful', 'library', 'for', 'AI', 'chat', 'bot']
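The sentence above contains no punctuation, so it does not show the rule-based splitting. As a rough sketch, the snippet below uses `TreebankWordTokenizer`, a close relative of the tokenizer that `word_tokenize()` applies to each sentence. It is a stand-in chosen because it needs no model download. The sample sentence is made up.

```python
from nltk.tokenize import TreebankWordTokenizer

# Rule-based tokenizer; no nltk.download() call is required for it.
tokenizer = TreebankWordTokenizer()

text = "Don't panic, it's only a test."
tokens = tokenizer.tokenize(text)
print(tokens)
# ['Do', "n't", 'panic', ',', 'it', "'s", 'only', 'a', 'test', '.']
```

Contractions are split into two tokens, and punctuation marks become tokens of their own.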
sent_tokenize()
This NLTK function splits the input text into sentences. It takes a text string as input and returns a list of sentences.
import nltk
# nltk.download('punkt')
# This passage was generated by Poe
text = """
The sun rose in the clear blue sky, casting its warm rays upon the vibrant green landscape. Birds chirped their melodious tunes as a gentle breeze rustled the leaves of the trees. The air was filled with the sweet scent of blooming flowers. People began to emerge from their homes, greeting the day with a sense of anticipation and purpose. Some embarked on their daily routines, heading to work or school, while others sought adventure and exploration. Children played joyfully in the park, their laughter echoing through the air. It was a day brimming with possibilities, promising new experiences and memories to be made.
"""
sentences = nltk.sent_tokenize(text)  # sentences is a list
for sentence in sentences:  # print every sentence in a loop
    print(sentence)
Output:
The sun rose in the clear blue sky, casting its warm rays upon the vibrant green landscape.
Birds chirped their melodious tunes as a gentle breeze rustled the leaves of the trees.
The air was filled with the sweet scent of blooming flowers.
People began to emerge from their homes, greeting the day with a sense of anticipation and purpose.
Some embarked on their daily routines, heading to work or school, while others sought adventure and exploration.
Children played joyfully in the park, their laughter echoing through the air.
It was a day brimming with possibilities, promising new experiences and memories to be made.
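Sentences do not always end with a period. The sketch below uses an untrained `PunktSentenceTokenizer` so that it runs without downloading anything. This is an assumption made for convenience: `sent_tokenize()` uses a pretrained Punkt model that also knows common abbreviations, so results can differ on harder text. The sample text is made up.

```python
from nltk.tokenize.punkt import PunktSentenceTokenizer

# Untrained Punkt tokenizer: no abbreviation list, but enough for simple text.
tokenizer = PunktSentenceTokenizer()

text = "NLTK is fun. It splits text into sentences! Does it work?"
sentences = tokenizer.tokenize(text)

for number, sentence in enumerate(sentences, start=1):
    print(number, sentence)
```

Periods, exclamation marks, and question marks are all treated as possible sentence endings.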
pos_tag()
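pos_tag() takes a list of tokens and returns a list of (word, tag) pairs, where each tag represents the word's part of speech.
The table below explains what some of the tags mean.
Tag | Meaning |
---|---|
NNP | Proper noun, singular |
VBZ | Verb, third-person singular present |
JJ | Adjective |
For more tags, see the following page: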
https://blog.csdn.net/JasonJarvan/article/details/79955664
Example:
import nltk
# nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
text = "NLTK is a powerful library in Python."
tokens = nltk.word_tokenize(text)  # split the text into tokens
pos_tags = nltk.pos_tag(tokens)  # tag each token with its part of speech
print(pos_tags)
Output:
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data] C:\Users\leung\AppData\Roaming\nltk_data...
[nltk_data] Package averaged_perceptron_tagger is already up-to-
[nltk_data] date!
[('NLTK', 'NNP'), ('is', 'VBZ'), ('a', 'DT'), ('powerful', 'JJ'), ('library', 'NN'), ('in', 'IN'), ('Python', 'NNP'), ('.', '.')]
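Once the tokens are tagged, the result is an ordinary list of tuples that can be filtered. The sketch below copies the tagged output shown above, so it runs without any model download. The `tag_meanings` dictionary is a small hand-written lookup covering only the Penn Treebank tags in this sentence and is not part of NLTK.

```python
# Tagged tokens copied from the output above.
pos_tags = [('NLTK', 'NNP'), ('is', 'VBZ'), ('a', 'DT'), ('powerful', 'JJ'),
            ('library', 'NN'), ('in', 'IN'), ('Python', 'NNP'), ('.', '.')]

# Hand-written lookup for the tags that appear in this sentence only.
tag_meanings = {
    'NNP': 'proper noun, singular',
    'VBZ': 'verb, third-person singular present',
    'DT': 'determiner',
    'JJ': 'adjective',
    'NN': 'noun, singular or mass',
    'IN': 'preposition or subordinating conjunction',
    '.': 'sentence-final punctuation',
}

# Print each token with its tag and a readable description.
for word, tag in pos_tags:
    print(f"{word:10} {tag:4} {tag_meanings.get(tag, 'unknown')}")

# Keep only the proper nouns.
proper_nouns = [word for word, tag in pos_tags if tag == 'NNP']
print(proper_nouns)  # ['NLTK', 'Python']
```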
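Using ps.stem()
Stemming reduces a word to its base or root form. It helps normalize words by stripping affixes; the Porter stemmer used here only strips suffixes. Here is an example of stemming with the NLTK library: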
# import the Porter stemmer
from nltk.stem import PorterStemmer
ps = PorterStemmer()
# choose some words to be stemmed
words = ["running", "jumps", "jumping", "played", "playing", "program", "programs", "programmer", "programming", "programmers"]
for w in words:
    print(w, ":", ps.stem(w))  # stem each word
Output:
running : run
jumps : jump
jumping : jump
played : play
playing : play
program : program
programs : program
programmer : programm
programming : program
programmers : programm
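One practical use of stems is to group different forms of a word together. The sketch below reuses the same word list and groups the words by their Porter stem. The grouping logic is an illustration added here, not part of NLTK.

```python
from collections import defaultdict
from nltk.stem import PorterStemmer

ps = PorterStemmer()
words = ["running", "jumps", "jumping", "played", "playing", "program",
         "programs", "programmer", "programming", "programmers"]

# Map each stem to the list of original words that reduce to it.
groups = defaultdict(list)
for w in words:
    groups[ps.stem(w)].append(w)

for stem, members in groups.items():
    print(stem, ":", members)
```

Note that "programmer" and "programmers" end up under the stem "programm" rather than "program", which shows that a stem is not always a real word.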
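Feel free to try these out yourself. That's all for today. If you found this article helpful or have suggestions for improving it, you can follow me and leave a comment. See you tomorrow.
References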
https://www.nltk.org/api/nltk.chat.html
https://poe.com
https://blog.csdn.net/JasonJarvan/article/details/79955664